

Stylistic gait synthesis based on hidden Markov models

Author: Alexis Moinet
Joëlle Tilmanne
Thierry Dutoit
Publication venue: Springer Nature
Publication date: 01/01/2012

Mage - Reactive articulatory feature control of HMM-based parametric speech synthesis

Author: Astrinaki Maria
Dutoit Thierry
King Simon
Ling Zhen-Hua
Moinet Alexis
Richmond Korin
Yamagishi Junichi
Publication date: 01/01/2013

In this paper, we present the integration of articulatory control into MAGE, a framework for real-time and interactive (reactive) parametric speech synthesis using hidden Markov models (HMMs). MAGE is based on the speech synthesis engine from HTS and uses acoustic features (spectrum and f0) to model and synthesize speech. In this work, we replace the standard acoustic models with models combining acoustic and articulatory features, such as tongue, lip and jaw positions. We then use feature-space-switched articulatory-to-acoustic regression matrices to control the spectral acoustic features by manipulating the articulatory features. Combining this synthesis model with MAGE allows us to interactively and intuitively modify phones synthesized in real time, for example transforming one phone into another, by controlling the configuration of the articulators in a visual display. Index Terms: speech synthesis, reactive, articulators

CiteSeerX

Edinburgh Research Explorer

A Comparative Analysis of Pretrained Language Models for Text-to-Speech

Author: Drugman Thomas
Granero-Moya Marcel
Karanasou Penny
Karlapati Sri
Moinet Alexis
Peinelt Nicole
Schnell Bastian
Publication date: 04/09/2023

State-of-the-art text-to-speech (TTS) systems have utilized pretrained language models (PLMs) to enhance prosody and create more natural-sounding speech. However, while PLMs have been extensively researched for natural language understanding (NLU), their impact on TTS has been overlooked. In this study, we aim to address this gap by conducting a comparative analysis of different PLMs for two TTS tasks: prosody prediction and pause prediction. Firstly, we trained a prosody prediction model using 15 different PLMs. Our findings revealed a logarithmic relationship between model size and quality, as well as significant performance differences between neutral and expressive prosody. Secondly, we employed PLMs for pause prediction and found that the task was less sensitive to small models. We also identified a strong correlation between our empirical results and the GLUE scores obtained for these language models. To the best of our knowledge, this is the first study of its kind to investigate the impact of different PLMs on TTS.

Comment: Accepted for presentation at the 12th ISCA Speech Synthesis Workshop (SSW) in Grenoble, France, from 26th to 28th August 202

arXiv.org e-Print Archive